gh-129149: Add fast path for medium-size integers in PyLong_FromSsize_t()#129301
gh-129149: Add fast path for medium-size integers in PyLong_FromSsize_t()#129301picnixz merged 13 commits intopython:mainfrom
PyLong_FromSsize_t()#129301Conversation
Use it in PyLong_FromLong() and PyLong_FromLongLong(). This is just a refactoring and will create the same binary code.
Use it in now in PyLong_FromSsize_t(), too.
PyLong_FromSsize_tPyLong_FromSsize_t()
skirpichev
left a comment
There was a problem hiding this comment.
LGTM
Few stylistic nitpicks, feel free to ignore.
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
|
I like your suggestions, brings us more in sync with |
|
BTW, next time you can apply several suggestions in one shot (subscribers will get less notifications): it's possible add them in one batch (on tab "files changed"). |
Great, will do! I've done some more syncs with PYLONG_FROM_UINT, if we want to keep the diff small, we can remove (all of) them. |
|
@chris-eibl, please don't sync with the main branch, unless you want to fix file conflict or trigger a CI build. |
|
Just to reassure: this needs no further action from me atm - correct? Most probably, another core reviewer is needed since it is still labeled as "awaiting core review"? Do I have to request this? If yes, whom should I ask for a review? |
|
CC @picnixz |
picnixz
left a comment
There was a problem hiding this comment.
Out of curiosity, can we have some benchmarks?
(on a multiple of 4 spaces)
pyperformance was neutral on it. I did think of some micro benchmarks, but I've found no good candidate without writing an external c application to directly benchmark the Maybe something with ctypes would work out, too? |
picnixz
left a comment
There was a problem hiding this comment.
Can you actually tests for checking the medium-size path? as well as perhaps checks for checking SSIZE_T_MIN (namely, tests to testcapi)
|
Yeah, Especially one negative medium-size int would be fine, too? Let's change values = (0, 1, 1234, max_val)
if min_val < 0:
values += (-1, min_val)to values = (0, 1, 512, 1234, max_val)
if min_val < 0:
values += (-1, -512, -1234, min_val) |
|
The bots are failing. Seems not related to the PR. Maybe updating with the main branch can fix it - shall I give it a try? |
|
yes, the CI config changed |
|
Yeah - that fixed it :) |
|
🤖 New build scheduled with the buildbot fleet by @picnixz for commit fda2844 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F129301%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
picnixz
left a comment
There was a problem hiding this comment.
LGTM, conditioned to build bot success (one build bot is know to fail though)
|
So the 2 build bots currently failing are know to fail so I think we can safely merge this one. Just asking Victor if he wants to add something else here. |
|
Here are some benchmarks:
before: after: Details
#include <chrono>
#include <iomanip>
#include <iostream>
#include "Python.h"
#define _PY_NSMALLPOSINTS 257
#define _PY_NSMALLNEGINTS 5
const Py_ssize_t PyLong_MASK_ssize_t = static_cast<Py_ssize_t>(PyLong_MASK);
void bench(Py_ssize_t val)
{
const auto start{ std::chrono::steady_clock::now() };
for (size_t i = 0; i < 10000000; ++i)
{
PyObject* obj = PyLong_FromSsize_t(val);
Py_DECREF(obj);
}
const auto finish{ std::chrono::steady_clock::now() };
const std::chrono::duration<double> elapsed_seconds{ finish - start };
std::cout << std::setw(12) << val << ": " << std::setw(8) << elapsed_seconds.count() * 1000 << " ms\n";
}
int main()
{
Py_Initialize();
std::cout << "some small ints\n";
std::cout << std::setprecision(2) << std::fixed;
bench(-_PY_NSMALLNEGINTS);
bench(0);
bench(_PY_NSMALLPOSINTS - 1);
std::cout << "some medium ints\n";
bench(0 - PyLong_MASK_ssize_t);
bench(1000);
bench(PyLong_MASK_ssize_t - 1);
std::cout << "some bigger ints\n";
bench(0 - PyLong_MASK_ssize_t - 1);
bench(PyLong_MASK_ssize_t + 1);
Py_Finalize();
}
|
|
The benchmarks are promising. We are gaining like 10-20% so it's non-negligible. |
vstinner
left a comment
There was a problem hiding this comment.
LGTM. Interesting optimization.
|
@vstinner: shall |
Let's wait until this change PR is merged. |
|
I'll merge this by the end of the week (hence my assignment). @chris-eibl can you update the branch just to run the latest CI? TiA (I'll do it tomorrow otherwise) |
|
Thank you! |
You can reuse the same issue for this follow-up PR btw. |
…romSsize_t()` (python#129301) The implementation of `PyLong_FromLong()`, `PyLong_FromLongLong()` and `PyLong_FromSsize_t()` are now handled by a common macro `PYLONG_FROM_INT` which contains fast paths depending on the size of the integer to convert. Consequently, `PyLong_FromSsize_t()` for medium-sized integers is faster by roughly 25%. --------- Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
Add the macro
PYLONG_FROM_INTand use it inPyLong_FromLong()andPyLong_FromLongLong().There, this is just a binary compatible refactoring.
Use it in
PyLong_FromSsize_t(), too, to get the fast path for medium-size integers.